Gaussian processes scale prohibitively with the size of the dataset. In response, many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. Therefore in practice, GP models are often as much about the approximation method as they are about the data. Here, we develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended. The most common GP approximations map to an instance in this class, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points. For any method in this class, we prove (i) convergence of its posterior mean in the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical and computational covariances, and (iii) that the combined variance is a tight worst-case bound for the squared error between the method's posterior mean and the latent function. Finally, we empirically demonstrate the consequences of ignoring computational uncertainty and show how implicitly modeling it improves generalization performance on benchmark datasets.
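A minimal sketch of claim (ii), in assumed notation rather than the paper's own: write \(\hat{K}\) for the kernel matrix plus observation noise and \(C_i\) for the approximation to \(\hat{K}^{-1}\) implied by the computation expended. The split of the combined covariance is then an algebraic identity:

```latex
% Hedged sketch of the covariance decomposition; \hat{K} = K + \sigma^2 I is
% assumed notation, C_i \approx \hat{K}^{-1} is the approximate method's solve.
\underbrace{k(x,x') - k(x,X)\,C_i\,k(X,x')}_{\text{combined}}
  = \underbrace{k(x,x') - k(x,X)\,\hat{K}^{-1}k(X,x')}_{\text{mathematical}}
  + \underbrace{k(x,X)\,\bigl(\hat{K}^{-1} - C_i\bigr)\,k(X,x')}_{\text{computational}}
```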
Variational approximations to Gaussian processes (GPs) typically use a small set of inducing points to form a low-rank approximation to the covariance matrix. In this work, we instead exploit a sparse approximation of the precision matrix. We propose variational nearest neighbor Gaussian processes (VNNGP), which introduce a prior that retains correlations only within the K nearest-neighboring observations, thereby inducing a sparse precision structure. Using the variational framework, VNNGP's objective can be factorized over both observations and inducing points, enabling stochastic optimization with O(K^3) time complexity. Hence, we can arbitrarily scale the number of inducing points, even to the point of placing an inducing point at every observed location. We compare VNNGP to other scalable GPs through various experiments, and demonstrate that VNNGP (1) can dramatically outperform low-rank methods, and (2) is less prone to overfitting than other nearest neighbor methods.
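A minimal NumPy sketch of the nearest-neighbor sparse precision structure that VNNGP exploits (a Vecchia-style factorization, with each point conditioned only on its k nearest predecessors). The function names and the dense storage are illustrative only; the actual method places this prior over inducing points inside a variational objective.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel between row-stacked inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def nn_precision_factor(X, k=10, jitter=1e-6):
    """Factorize the prior as prod_i p(f_i | f_N(i)), with N(i) the k nearest
    earlier points. Returns unit lower-triangular B and diagonal d such that
    the implied precision is B^T diag(1/d) B, which has O(k) entries per row
    (stored densely here purely for readability)."""
    n = X.shape[0]
    B = np.eye(n)
    d = np.empty(n)
    d[0] = rbf(X[:1], X[:1])[0, 0] + jitter
    for i in range(1, n):
        dist = ((X[:i] - X[i]) ** 2).sum(-1)
        nb = np.argsort(dist)[:k]                    # k nearest predecessors
        Knn = rbf(X[nb], X[nb]) + jitter * np.eye(len(nb))
        kin = rbf(X[nb], X[i:i + 1])[:, 0]
        w = np.linalg.solve(Knn, kin)                # conditional regression weights
        B[i, nb] = -w
        d[i] = rbf(X[i:i + 1], X[i:i + 1])[0, 0] - w @ kin + jitter
    return B, d
```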
Gaussian process hyperparameter optimization requires linear solves and log-determinants over large kernel matrices. Iterative numerical techniques relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant are becoming increasingly popular. This work introduces new algorithmic and theoretical insights for preconditioning these computations. While preconditioning is well understood in the context of CG, we demonstrate that it can also accelerate convergence and reduce the variance of the estimates of the log-determinant and its derivatives. We prove general probabilistic error bounds for the preconditioned computation of the log-determinant, the log-marginal likelihood, and their derivatives. Additionally, we derive specific rates for a range of kernel-preconditioner combinations, showing that up to exponential convergence can be achieved. Our theoretical results enable provably efficient optimization of kernel hyperparameters, which we validate empirically on large-scale benchmark problems. There, our method accelerates training by up to an order of magnitude.
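A sketch of the two computations the abstract names, under stated assumptions: a Jacobi (diagonal) preconditioner is used as a simple stand-in for the stronger choices the paper analyzes, and the kernel, lengthscale, and probe count are arbitrary.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
n, ls, noise = 2000, 0.2, 1e-2
X = rng.uniform(size=(n, 1))
D2 = (X - X.T) ** 2
K0 = np.exp(-0.5 * D2 / ls ** 2)           # noise-free RBF kernel matrix
K = K0 + noise * np.eye(n)                 # K_hat = K0 + sigma^2 I

# Diagonal (Jacobi) preconditioner as a stand-in; the paper analyzes stronger
# choices (e.g. partial Cholesky) with provable convergence rates.
M = LinearOperator((n, n), matvec=lambda v: v / np.diag(K))

# Linear solve K_hat @ alpha = y with preconditioned CG.
y = rng.standard_normal(n)
alpha, info = cg(K, y, M=M)

# Stochastic trace estimate of a log-det derivative, here w.r.t. lengthscale:
# d/d(ls) log det K = tr(K^{-1} dK/d(ls)) ~ mean_z z^T K^{-1} (dK/d(ls)) z.
dK = K0 * D2 / ls ** 3                     # RBF kernel derivative w.r.t. ls
Z = rng.choice([-1.0, 1.0], size=(n, 16))  # Rademacher probe vectors
est = np.mean([cg(K, Z[:, j], M=M)[0] @ (dK @ Z[:, j]) for j in range(16)])
```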
Large width limits have been a recent focus of deep learning research: modulo computational practicalities, do wider networks outperform narrower ones? Answering this question has been challenging, as conventional networks gain representational power with width, potentially masking any negative effects. Our analysis in this paper decouples capacity and width via the generalization of neural networks to deep Gaussian processes (deep GPs), a class of nonparametric hierarchical models that subsumes neural networks. In doing so, we aim to understand how width affects (standard) neural networks once they have sufficient capacity for a given modeling task. Our theoretical and empirical results on deep GPs suggest that large width can be detrimental to hierarchical models. Surprisingly, we prove that even nonparametric deep GPs converge to Gaussian processes, effectively becoming shallower without any gain in representational power. The posterior, which corresponds to a mixture of data-adaptive basis functions, becomes less data-dependent as width increases. Our tail analysis shows that width and depth have opposite effects: depth accentuates a model's non-Gaussianity, while width makes models increasingly Gaussian. We find there is a "sweet spot" that maximizes test performance before the limiting GP behavior prevents adaptability, occurring at width = 1 or width = 2 for nonparametric deep GPs. These results make strong predictions about the same phenomenon in conventional neural networks trained with L2 regularization (analogous to a Gaussian prior on parameters): such networks may need as many as 500 to 1000 hidden units to reach sufficient capacity, depending on the dataset, but further width degrades performance.
Normalizing flows are invertible neural networks with tractable change-of-volume terms, which allows their parameters to be optimized efficiently via maximum likelihood. However, data of interest are typically assumed to live on some (often unknown) low-dimensional manifold embedded in a high-dimensional ambient space. The result is a modeling mismatch, since by construction the invertibility requirement implies high-dimensional support of the learned distribution. Injective flows, mappings from low- to high-dimensional spaces, aim to fix this discrepancy by learning distributions on manifolds, but the resulting volume-change term becomes more challenging to evaluate. Current approaches either avoid computing this term entirely using various heuristics, or assume the manifold is known beforehand and are therefore not widely applicable. Instead, we propose two methods to tractably calculate the gradient of this term with respect to the parameters of the model, relying on careful use of automatic differentiation and techniques from numerical linear algebra. Both approaches perform end-to-end nonlinear manifold learning and density estimation for data projected onto this manifold. We study the trade-offs between our proposed methods, empirically verify that we outperform approaches that ignore the volume-change term by learning manifolds and the corresponding distributions more accurately, and show promising results on out-of-distribution detection. Our code is available at https://github.com/layer6ai-labs/rectangular-flows.
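A brute-force PyTorch sketch of the injective change-of-variables density, for orientation only: it builds the Jacobian explicitly, which is precisely what the paper's proposed methods avoid when estimating the gradient of the volume-change term. The map f below is a hypothetical example, not from the paper.

```python
import torch

def injective_log_density(f, base_log_prob, z):
    """Change of variables for an injective f: R^d -> R^D with d < D:
    log p(x) = log p_Z(z) - 0.5 * logdet(J^T J) at x = f(z).
    This version forms J explicitly for clarity; the paper's contribution is
    estimating the gradient of the logdet term without ever forming J, via
    automatic differentiation plus iterative numerical linear algebra."""
    J = torch.autograd.functional.jacobian(f, z)      # shape (D, d)
    _, logdet = torch.linalg.slogdet(J.T @ J)
    return base_log_prob(z) - 0.5 * logdet

# Hypothetical injective map from R^2 into R^3, for illustration only.
f = lambda z: torch.stack([z[0], z[1], torch.sin(z[0]) * torch.cos(z[1])])
z = torch.randn(2)
base = torch.distributions.MultivariateNormal(torch.zeros(2), torch.eye(2))
print(injective_log_density(f, base.log_prob, z))
```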
Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware. We present an efficient and general approach to GP inference based on Blackbox Matrix-Matrix multiplication (BBMM). BBMM inference uses a modified batched version of the conjugate gradients algorithm to derive all terms for training and inference in a single call. BBMM reduces the asymptotic complexity of exact GP inference from O(n^3) to O(n^2). Adapting this algorithm to scalable approximations and complex GP models simply requires a routine for efficient matrix-matrix multiplication with the kernel and its derivative. In addition, BBMM uses a specialized preconditioner to substantially speed up convergence. In experiments we show that BBMM effectively uses GPU hardware to dramatically accelerate both exact GP inference and scalable approximations. Additionally, we provide GPyTorch, a software platform for scalable GP inference via BBMM, built on PyTorch.
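For concreteness, a minimal exact-GP regression loop in GPyTorch following the library's standard pattern; the data and hyperparameters are placeholders. Under the hood, the marginal log-likelihood and its gradients are computed through BBMM rather than a Cholesky factorization.

```python
import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

train_x = torch.linspace(0, 1, 1000)
train_y = torch.sin(train_x * (2 * torch.pi)) + 0.1 * torch.randn(train_x.size(0))

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

model.train(); likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
for _ in range(50):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)   # all training terms via BBMM/CG
    loss.backward()
    optimizer.step()
```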
The machine learning community has become increasingly concerned with the potential for bias and discrimination in predictive models. This has motivated a growing line of work on what it means for a classification procedure to be "fair." In this paper, we investigate the tension between minimizing error disparity across different population groups while maintaining calibrated probability estimates. We show that calibration is compatible only with a single error constraint (i.e. equal false-negative rates across groups), and show that any algorithm that satisfies this relaxation is no better than randomizing a percentage of predictions for an existing classifier. These unsettling findings, which extend and generalize existing results, are empirically confirmed on several datasets.
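A hypothetical sketch (not the authors' released code) of the "randomize a percentage of predictions" construction the result refers to: replacing a random fraction of the advantaged group's calibrated scores with that group's base rate leaves calibration intact while raising its generalized false-negative rate to match the other group's.

```python
import numpy as np

def equalize_gfnr(probs, labels, group, rng=np.random.default_rng(0)):
    """Equalize generalized false-negative rates E[1 - h(x) | y=1] across two
    groups (coded 0/1) by randomly replacing a fraction of the advantaged
    group's calibrated scores with its base rate. Assumes the classifier
    beats the trivial base-rate predictor, so a valid fraction exists."""
    gfnr = {g: np.mean(1.0 - probs[(group == g) & (labels == 1)]) for g in (0, 1)}
    adv = min(gfnr, key=gfnr.get)        # group with the lower (better) rate
    target = max(gfnr.values())          # match the worse group's rate
    mask = group == adv
    mu = labels[mask].mean()             # group base rate: a calibrated constant
    alpha = (target - gfnr[adv]) / ((1.0 - mu) - gfnr[adv])
    out = probs.copy()
    replace = mask & (rng.uniform(size=probs.shape[0]) < alpha)
    out[replace] = mu
    return out
```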
Confidence calibration, the problem of predicting probability estimates representative of the true correctness likelihood, is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling, a single-parameter variant of Platt Scaling, is surprisingly effective at calibrating predictions.
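Temperature scaling is compact enough to sketch in full: fit a single scalar T by minimizing the negative log-likelihood on held-out validation logits, then divide test logits by T before the softmax. The helper below is illustrative; the optimizer choice and iteration count are assumptions.

```python
import torch

def fit_temperature(logits, labels, max_iter=50):
    """Learn a single temperature T > 0 minimizing NLL on held-out logits
    (temperature scaling, the single-parameter variant of Platt scaling)."""
    log_t = torch.zeros(1, requires_grad=True)   # optimize log T for positivity
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)
    nll = torch.nn.CrossEntropyLoss()

    def closure():
        opt.zero_grad()
        loss = nll(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

# Usage: calibrate on validation logits, then rescale test logits.
# T = fit_temperature(val_logits, val_labels)
# probs = torch.softmax(test_logits / T, dim=1)
```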
This paper describes an evaluation of Automated Theorem Proving (ATP) systems on problems taken from the QMLTP library of first-order modal logic problems. Principally, the problems are translated to higher-order logic in the TPTP languages using an embedding approach, and solved using higher-order logic ATP systems. Additionally, the results from native modal logic ATP systems are considered, and compared with those from the embedding approach. The conclusions are that (i) the embedding process is reliable and successful, (ii) the choice of backend ATP system can significantly impact the performance of the embedding approach, (iii) native modal logic ATP systems outperform the embedding approach, and (iv) the embedding approach can cope with a wider range of modal logics than the native modal systems considered.
We report on an exploration of variants of Boolos' curious inference using higher-order automated theorem provers (ATPs). Surprisingly, only a single shorthand notation still had to be provided by hand. All higher-order lemmas required for obtaining short proofs were discovered automatically by the computer. Given the observations and suggestions in this paper, full automation of Boolos' example, and of related examples of speedup in proof lengths, seems to be within reach for higher-order ATPs.